Dimodal Model Tests: Significance models of features in the low-pass spacing.

Description

Return the probability of the characteristic feature value, the height of a peak or length of a flat, using parametric models developed for the low-pass spacing, or determine the feature value at some significance level.

Usage

Dipeak.test(ht, n, flp, filter, lower.tail=TRUE)
Dipeak.critval(pval, n, flp, filter)
Diflat.test(len, n, flp, filter, basedist, lower.tail=TRUE)
Diflat.critval(pval, n, flp, filter, basedist)

Value

Dipeak.test and Diflat.test return lists of class "Ditest" with elements

method: a string describing the test
statfn: function used to evaluate significance level/probability
statistic: what is tested, the height of the peak or length of the flat
statname: text string describing the statistic
parameter: distributional arguments, for the peak the corrected height corrht and the mu and lambda for the inverse Gaussian; omitted for flats
p.value: probability of feature
alternative: a string describing the direction of the test vs. the null distribution
model: parameters for the feature model, n the data size, flp the low-pass filter size as a fraction of n, filter the low-pass kernel, and for flats basedist the distribution used to build the model

statistic and p.value will have the length of the ht or len argument. NA and NaN values in the first argument will propagate to p.value, NULL produces an empty vector, and non-numeric values an NA. If pval is less than 0 or greater than 1 the p.value is NaN.

Arguments

ht: difference(s) between the standardized data value at the peak and deepest minimum to either side
len: length(s) of flat in data points
pval: the significance level(s) at which to find the corresponding height or length; the quantile of the feature value is 1-pval
n: number of data points before filtering
flp: the size of the FIR kernel, either as a fraction of n or as an integer
filter: the FIR kernel used to smooth the spacing
basedist: for the flat models, the distribution used to generate the length quantiles
lower.tail: a logical; if TRUE the test returns the probability that the null distribution is less than or equal to the feature value, if FALSE the probability that it is greater

Details

The test functions convert the feature value into a quantile or significance level based on null distribution models. The critval functions do the opposite. The models are parametric because they are built on draws of specifically chosen variates and the size of features that appear after low-pass filtering the data. The features depend on the size of the draw n and the smoothing done, set by the Finite Impulse Response (FIR) filter and the size flp of the kernel. Implicitly they depend on the feature detectors, but variations in the parameters controlling those have neither been studied nor incorporated in the model.

The peak height model comes from draws of an asymmetric Weibull variate with scale 2 and shape 4, which proved to give reasonable, conservative quantiles against other distributions. The preferred filter uses a Kaiser kernel. The other filters available, the Bartlett or triangular (synonyms), Hanning, Hamming, Gaussian or normal (synonyms), and Blackman kernels, are handled by scaling the Kaiser model. The filter size is typically expressed as a fraction of the draw size, with flp=0.15 a good default; spans in data points are also accepted. Smaller kernels will produce rougher data with more peaks and fewer flats, and can be tolerated if the spacing is already smooth, as happens with very large data sets. The test height for the model is scaled by the standard deviation of the total signal.
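
As a rough illustration of the set-up described above, the following base R sketch draws the Weibull variate, forms the spacing, and smooths it with a Kaiser kernel. It is not the Dimodal implementation: the helper kaiser_kernel, the value beta = 4, the rounding rule for the span, and taking the spacing as the difference between consecutive order statistics are all assumptions made for the example.

```r
# Sketch only: a base R stand-in, not the Dimodal implementation.
set.seed(1)
n   <- 200          # draw size
flp <- 0.15         # filter size as a fraction of n

# Convert a fractional size to a span in data points (assumed rounding rule).
span <- if (flp < 1) max(1L, as.integer(round(flp * n))) else as.integer(flp)

# Kaiser window of length m; beta = 4 is an illustrative choice.
kaiser_kernel <- function(m, beta = 4) {
  if (m == 1L) return(1)
  k   <- seq(0, m - 1)
  arg <- pmax(0, 1 - (2 * k / (m - 1) - 1)^2)   # guard against rounding below 0
  w   <- besselI(beta * sqrt(arg), nu = 0) / besselI(beta, nu = 0)
  w / sum(w)        # normalize to unit gain
}

x       <- sort(rweibull(n, shape = 4, scale = 2))  # variate named in the text
spacing <- diff(x)                                  # gaps between order statistics
kern    <- kaiser_kernel(span)
lp      <- stats::filter(spacing, kern, sides = 2)  # centered FIR smoothing, NA at the ends
```

Peaks and flats would then be sought in lp; a smaller flp shortens kern and leaves a rougher series.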

The peak test models the distribution of heights with an inverse Gaussian, a.k.a. Wald distribution. The height is corrected for the filter and its size, and the inverse Gaussian location and scale parameters depend on the data and filter sizes. These values are provided in the returned list.
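
For readers unfamiliar with the inverse Gaussian, the sketch below evaluates its textbook distribution function in base R and cross-checks it against numerical integration of the density. The function names and the values of mu and lambda are illustrative assumptions; the package computes its own parameters from n, flp, and filter and reports them in the parameter element.

```r
# Sketch only: textbook inverse Gaussian (Wald) distribution in base R.
# mu and lambda below are illustrative, not values produced by Dipeak.test.
pinvgauss_sketch <- function(x, mu, lambda, lower.tail = TRUE) {
  z1 <- sqrt(lambda / x) * (x / mu - 1)
  z2 <- -sqrt(lambda / x) * (x / mu + 1)
  p  <- pnorm(z1) + exp(2 * lambda / mu) * pnorm(z2)
  if (lower.tail) p else 1 - p
}

dinvgauss_sketch <- function(x, mu, lambda) {
  sqrt(lambda / (2 * pi * x^3)) * exp(-lambda * (x - mu)^2 / (2 * mu^2 * x))
}

mu <- 1.2; lambda <- 3
p_lo <- pinvgauss_sketch(2, mu, lambda)                      # P(X <= 2)
p_hi <- pinvgauss_sketch(2, mu, lambda, lower.tail = FALSE)  # P(X > 2)

# Cross-check the closed form against numerical integration of the density.
p_num <- integrate(dinvgauss_sketch, 0, 2, mu = mu, lambda = lambda)$value
```

The two tails sum to one, which mirrors the role of the lower.tail argument in the test functions.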

The flat length model varies much more with the parametric distribution chosen as the base, and the recommended basedist, a logistic variate, is a compromise. Models for normal or Gaussian (synonyms), Gumbel, and Weibull distributions are also available, but there is little overlap between the quantiles of lengths within them; the logistic falls in the middle. The Weibull variant is more liberal, accepting lengths that are two-thirds those needed to pass at the same level as the logistic. The Gumbel lengths are four-thirds longer. The filter type, size, and draw size are the same as for the peak height model. Unlike the peak model, different filters require different models internally.

The length distribution varies smoothly with the data size and filter, and the flat model can calculate the probability directly without going through a distribution function.

The models come from simulations over the ranges n = 50 ... 500 and flp = 0.05 ... 0.5, measuring quantiles between q = 0.90 ... 0.99999. They fit the critical values within 5% over most of these values, degrading to 10% at the edges. The spread in the reported probability also increases at the edges of the parameter space. In particular, data sets of fewer than 60 points or windows larger than 30% are less trustworthy, as are quantiles beyond 0.9999. The models will generate a warning under these conditions and a tighter significance level should be used to judge the results. For data sizes much beyond 500, it is better to switch to the normal or Weibull base distribution when testing flats.
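
The limits above can be checked before calling the tests. The helper below is hypothetical and not part of Dimodal; it simply restates the thresholds from this section and may not match the conditions under which the package itself warns.

```r
# Hypothetical helper, not part of Dimodal: lists the reasons a set-up falls
# outside the better-supported part of the simulated range.
check_model_range <- function(n, flp, q = NULL) {
  frac  <- if (flp >= 1) flp / n else flp   # accept spans in data points too
  notes <- character(0)
  if (n < 60)      notes <- c(notes, "fewer than 60 data points")
  if (frac > 0.30) notes <- c(notes, "filter window larger than 30% of n")
  if (!is.null(q) && any(q > 0.9999, na.rm = TRUE))
    notes <- c(notes, "quantile beyond 0.9999")
  if (n > 500)     notes <- c(notes, "n beyond the simulated range (50 to 500)")
  notes
}

check_model_range(200, 0.15)               # character(0): nothing to flag
check_model_range(55, 0.40, q = 0.99999)   # three reasons returned
```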

Bad values passed for the draw and LP kernel sizes will raise errors. The filter name will default to Kaiser if the argument does not match a supported kernel or if it is a bad value (NA, empty, or non-character). The base distribution similarly defaults to the logistic. The arguments correspond to options "lp.kernel", "lp.window" or "diw.window", and "flat.distrib". The probabilities should be evaluated against "alpha.ht" and "alpha.len" for the minimum passing level.

All four functions can take vectors as their first argument, which are evaluated one by one for the given filter and draw set-up.

Examples

## Upper-tail probabilities for peak heights 0.25, 0.50, ..., 4.
pk <- Dipeak.test(0.25*(1:16), 200, 0.15, 'kaiser', lower.tail=FALSE)
pk$p.value
## Should recover the heights passed to Dipeak.test.
Dipeak.critval(pk$p.value, 200, 0.15, 'kaiser')

## Same round trip for flat lengths 10, 20, ..., 120.
fl <- Diflat.test(10*(1:12), 200, 0.15, 'kaiser', 'logistic', lower.tail=FALSE)
fl$p.value
Diflat.critval(fl$p.value, 200, 0.15, 'kaiser', 'logistic')